Abstract



In this paper we show how to take personally assessed information and use it
to  develop  a  continuous  unimodal  prior  density  function,  perhaps  for  subsequent 
Bayesian  analysis.  The  method  is  completely  nonparametric  and  uses  only  the 
furnished  information  and  no  other.  The  technique  is  easily  computerized,  and 
yields  a  closed  analytical  formula  for  the  prior.  The  resulting  distribution  may  be 
considered  to  be  an  inferential  distribution. 


Keywords:  Prior  distribution  assessment,  unimodality,  information  theory, 
maximum  entropy 


1.  Introduction 


A  basic  rule  in  statistical  analysis  is  to  use  all  the  information  which  you 
have,  but  avoid  using  any  information  which  you  don’t  have.  The  desire  to  use  the 
prior  belief  or  experiences  of  the  scientist  as  information  capable  of  being  input  into 
the  statistical  analysis  has  led  to  the  branch  of  Bayesian  statistics,  and  the  same 
goal  has  also  led  to  information-theoretic  statistical  analysis.  One  crucial  problem 
which  must  be  addressed  in  applying  any  Bayesian  methods  in  real  situations  in 
business,  medicine,  and  other  fields  is  how  to  take  the  information  supplied  by  the 
client  (or  the  scientist  himself)  and  obtain  an  inferential  (or  prior)  distribution  for 
the stochastic phenomenon under study. In particular, an inferential distribution
must  be  found  in  order  to  update  expectations  (and  find  Bayes  estimators),  keeping 
in  mind  that  you  should  avoid  using  information  which  you  don’t  have.  In  real 
applications  you  should  not  necessarily  assume  a  parametric  prior  (such  as  a  normal 
prior)  and  just  proceed  to  estimate  parameters  unless  the  distributional  model  has 
been  given  as  part  of  the  information.  To  paraphrase  Albert  Einstein,  the  model 
should  be  as  simple  as  possible,  but  no  simpler. 

In  this  paper  we  address  the  topic  of  inferential  density  assessment  when  we 
know the prior density is unimodal. We quantify the amount of information in a
statistical density using information-theoretic techniques, and we show
explicitly how to "use all the information available (including unimodality) and no
other". Our technique involves transforming the problem from the original
unimodal  stochastic  variable  to  an  auxiliary  variable.  We  then  estimate  the  density 
for  the  auxiliary  variable  using  minimum  discrimination  information  subject  to  the 
constraints  obtained  about  the  original  variable.  The  detailed  formulae  are  given  in 
the  next  several  sections.  The  result  is  easily  computerized  so  that  the  user  need 
only  input  a  few  basic  characteristics,  and  the  computer  then  outputs  a  graphical 
and  also  analytical  representation  of  the  desired  density. 


2.  Information  Theoretic  Density  Estimation 


In information-theoretic notation, the expected amount of information in an
observation $X$ for distinguishing between two density functions $f$ and $g$ is
denoted by $I(f|g)$. Mathematically, this expected information is quantified by the
expected value of the difference in log-odds ratio, or Kullback-Leibler number, viz.,

$$I(f|g) = \int f(x) \ln\left[\frac{f(x)}{g(x)}\right] \lambda(dx) \tag{2.1}$$

where $\lambda$ is some dominating measure for $f$ and $g$. We shall call $I(f|g)$ the
informational divergence between $f$ and $g$. If $g(x) = 1$ for all $x$, then $I(f|g)$
represents the informational divergence between the postulated density $f$ and the
completely uninformative density $g$. In this case $I(f|g)$ is precisely minus the
entropy of $f$, and is a measure of the uncertainty of the density $f$. In practice,
$\lambda(dx)$ is selected as Lebesgue measure in the (absolutely) continuous case, and
as counting measure in the discrete case.
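As a small numerical check of (2.1) in the discrete case (counting measure), the following sketch computes $I(f|g)$ for probability vectors. The function name `kl_divergence` and the three-point example are illustrative assumptions and do not come from the paper.

```python
import math

def kl_divergence(f, g):
    """Informational divergence I(f|g) = sum_x f(x) * ln(f(x) / g(x)).

    f and g are sequences of nonnegative values on the same finite support;
    terms with f(x) == 0 contribute zero by the usual convention.
    """
    total = 0.0
    for fx, gx in zip(f, g):
        if fx > 0.0:
            total += fx * math.log(fx / gx)
    return total

# Hypothetical three-point example.
f = [0.5, 0.25, 0.25]
ones = [1.0, 1.0, 1.0]            # the "uninformative" choice g(x) = 1
uniform = [1 / 3, 1 / 3, 1 / 3]   # normalized uniform density

entropy_f = -sum(p * math.log(p) for p in f)
print(kl_divergence(f, ones))     # equals minus the entropy of f
print(kl_divergence(f, uniform))  # positive, since f is not uniform
```

With $g(x) = 1$ the divergence reduces to minus the entropy, as stated above; against the normalized uniform density it differs from that value only by the constant $\ln 3$.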

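If we are given certain generalized moment constraints which the density $f$
must satisfy, such as

$$\begin{aligned}
1 = b_0 &= \int f(x)\,\lambda(dx) \\
b_1 &= \int h_1(x) f(x)\,\lambda(dx) \\
&\;\;\vdots \\
b_k &= \int h_k(x) f(x)\,\lambda(dx)
\end{aligned} \tag{2.2}$$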

then the minimum discrimination information (MDI) estimate of the density, relative
to $g$ and subject to the constraints (2.2), is defined to be the density $f$ that
minimizes (2.1) over all $f$ satisfying (2.2). If $g(x) = 1$, then this density estimate
is called the maximum entropy (M.E.) density subject to (2.2). This density is the
least informative distribution possible subject only to the constraints (2.2). If we
are to use only the information given in (2.2), and no other information, then the
M.E. density is implied. This constrained M.E. density estimation may be construed
as a useful extension of Laplace's famous "principle of insufficient reason", which
postulates a uniform distribution in the situation in which no knowledge is
available. Here, when information of the form (2.2) is available, we select that
distribution which is as close to uniform as possible subject to the given
constraints (2.2). Of course, other "goal densities" $g$ may be more appropriate in
other situations.


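The explicit calculation of the MDI density subject to (2.2) is easily carried out
using Lagrange multipliers. Introducing a multiplier $\alpha_i$ for constraint $i$, we
wish to maximize $-I(f|g)$. For any $f$ satisfying (2.2) we have

$$\begin{aligned}
-I(f|g) - \sum_{i=0}^{k} \alpha_i b_i
&= \int f(x) \ln\left[\frac{g(x)}{f(x)}\right] \lambda(dx) - \sum_{i=0}^{k} \alpha_i \int h_i(x) f(x)\,\lambda(dx) \\
&= \int f(x) \ln\left[\frac{g(x)}{f(x)} \exp\left(-\sum_{i=0}^{k} \alpha_i h_i(x)\right)\right] \lambda(dx) \\
&\le \int f(x) \left[\frac{g(x)}{f(x)} \exp\left(-\sum_{i=0}^{k} \alpha_i h_i(x)\right) - 1\right] \lambda(dx)
\end{aligned}$$

where $h_0(x) = 1$. The inequality follows since $\ln x \le x - 1$, with equality only
when $x = 1$. Thus the above inequality becomes an equality when

$$f(x) = \exp\left(-\sum_{i=0}^{k} \alpha_i h_i(x)\right) g(x).$$

Summarizing, the MDI density subject to the constraints (2.2) is precisely

$$f^*(x) = \exp\left(-\sum_{i=0}^{k} \alpha_i h_i(x)\right) g(x) \tag{2.3}$$

where $h_0(x) = 1$, and the constants $\alpha_i$, $i = 0, 1, \ldots, k$, are found by
solving the moment constraints (2.2) simultaneously.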
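To make "solving the moment constraints simultaneously" concrete, here is a minimal sketch for the simplest discrete case: $g(x) = 1$, counting measure on a finite support, and a single mean constraint $h_1(x) = x$. The support, the target mean and the function name `maxent_with_mean` are hypothetical choices for illustration. Bisection is used only because one free multiplier remains after normalization; it is not the method of the paper.

```python
import math

def maxent_with_mean(points, target_mean, lo=-5.0, hi=5.0, iters=200):
    """Maximum entropy probabilities on `points` with a prescribed mean.

    The solution has the form f(x) = exp(-alpha0 - alpha1 * x), where alpha0
    enforces normalization and alpha1 is found by bisection on the mean.
    """
    def weights(alpha1):
        return [math.exp(-alpha1 * x) for x in points]

    def mean_for(alpha1):
        w = weights(alpha1)
        return sum(x * wi for x, wi in zip(points, w)) / sum(w)

    # The mean is decreasing in alpha1, so bisection applies on [lo, hi].
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid    # mean still too large: alpha1 must increase
        else:
            hi = mid
    alpha1 = 0.5 * (lo + hi)
    w = weights(alpha1)
    z = sum(w)
    alpha0 = math.log(z)               # normalization constraint b_0 = 1
    f = [wi / z for wi in w]
    return alpha0, alpha1, f

points = [1, 2, 3, 4, 5, 6]            # hypothetical six-point support
alpha0, alpha1, f = maxent_with_mean(points, target_mean=4.5)
print(alpha0, alpha1)
print([round(p, 4) for p in f])
```

With more than one non-trivial constraint the multipliers can no longer be found one at a time, which is where a simultaneous solver becomes necessary.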


An easier method for determining the actual numerical values of the $\alpha_i$ can be
derived from the results in Brockett, Charnes and Cooper (1980), or in Charnes,
Cooper and Seiford (1978). There it is shown that the problem of minimizing (2.1)
subject to (2.2) is a constrained strictly convex programming problem with an
unconstrained dual convex program involving only exponential and linear terms.
Moreover, the desired $\alpha_i$, $i = 0, 1, \ldots, k$, are precisely the dual variables
and may easily be obtained on a computer by any of a number of nonlinear programming
codes (e.g., the SUMT method code of Garth McCormick and Charles Mylander). This
duality relationship is exploited in the final section, where numerical examples are
given. Thus the entire process of obtaining the MDI density is easily computerized.

3.  Exploiting  Knowledge  About  Unimodality 

Most  people  when  forecasting  or  estimating  a  prior  distribution  will  have  a 
unimodal  density  in  mind.  Consequently  the  M.E.  density  obtained  in  section  2  will 
be  rejected  by  most  decision  makers,  since,  except  for  certain  serendipitous 
situations concerning the relationships among the functions $h_i(x)$ and the $b_i$'s, the
M.E.  density  cannot  be  guaranteed  to  be  unimodal.  A  bi-  or  tri-modal  density  is  hard 
to  justify  to  the  decision  maker  in  many  (but  not  all)  situations.  Accordingly,  in  this 
section,  we  show  how  to  incorporate  the  knowledge  that  the  prior  density  for  the 
decision  maker  is  unimodal.  This  technique  is  of  independent  interest  in  improving 
many  procedures  to  incorporate  unimodality.  We  shall  show  how  to  obtain  an 
inferential  (or  prior)  distribution  which  is  as  uninformative  as  possible  subject  to 
being unimodal and satisfying the constraints (2.2).

As a concrete example, let us suppose we have elicited from the decision maker
the following information concerning an unknown prior variable $\theta$ (these may be
the result of sales forecasts, for example):

1) The prior density for $\theta$ is unimodal with the most likely value $\theta_0$.

2) The range of possibilities for $\theta$ is $a$ to $b$.

3) The decision maker will give even odds that $\theta$ is between two numbers $a_1$
and $a_2$. (This will give a measure of dispersion for the desired density.
Prescribing the 25th and 75th percentiles is another usual vehicle for
obtaining this sort of measure.)

4) The decision maker assesses the chance of $\theta$ falling short of $\theta_0$ as $p$.
(This will give a measure of skewness for the prior density.)

These constraints translate into:

i) $\theta$ is unimodal with mode $\theta_0$

and

ii)

$$\begin{aligned}
1 = b_0 &= \int_a^b h_0(\theta) f(\theta)\,d\theta \\
0.5 = b_1 &= \int_a^b h_1(\theta) f(\theta)\,d\theta \\
p = b_2 &= \int_a^b h_2(\theta) f(\theta)\,d\theta
\end{aligned} \tag{3.1}$$

where

$$h_0(\theta) = 1, \qquad
h_1(\theta) = \begin{cases} 1 & \text{if } \theta \in (a_1, a_2) \\ 0 & \text{otherwise} \end{cases}
\qquad \text{and} \qquad
h_2(\theta) = \begin{cases} 1 & \text{if } \theta \le \theta_0 \\ 0 & \text{otherwise.} \end{cases}$$

Of course, other possible constraints (such as other assessed percentiles) may be
added if the statistician desires.


The problem addressed in this section is how the Bayesian statistician (or
more  generally,  the  scientist  desiring  an  inferential  distribution)  can  use  the 
unimodality  information  in  a  constructive  manner. 

If $\theta$ is a unimodal variable with mode $\theta_0$, then the density satisfies
$f'(\theta) \ge 0$ for $\theta < \theta_0$ and $f'(\theta) \le 0$ for $\theta > \theta_0$.
Accordingly $-(\theta - \theta_0) f'(\theta) \ge 0$ for all $\theta$, and so
$-(\theta - \theta_0) f'(\theta)$ is proportional to the density of some random
variable, $X$. This is equivalent to the decomposition

$$\theta - \theta_0 = U \cdot X \tag{3.2}$$

where $U$ is uniform over $[0,1]$ and independent of the random variable $X$. This, of
course, is just L. Shepp's reformulation of Khinchin's famous characterization of
unimodal random variables (cf. Feller (1971), page 158).
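The decomposition (3.2) can be illustrated by simulation. The sketch below draws $\theta = \theta_0 + U X$ for a hypothetical auxiliary variable $X$ (uniform on $[-3, 7]$, an assumption made purely for illustration) and tabulates a histogram; the seed, the sample size and the name `sample_theta` are likewise illustrative.

```python
import random

def sample_theta(theta0, sample_x, rng):
    """Draw theta = theta0 + U * X with U ~ Uniform[0, 1) independent of X."""
    u = rng.random()
    x = sample_x(rng)
    return theta0 + u * x

rng = random.Random(12345)     # fixed seed so the run is reproducible
theta0 = 3.0

def uniform_x(r):
    # Hypothetical auxiliary variable: X uniform on [a - theta0, b - theta0] = [-3, 7].
    return r.uniform(-3.0, 7.0)

draws = [sample_theta(theta0, uniform_x, rng) for _ in range(100_000)]

# Histogram with unit-width bins on [0, 10).
counts = [0] * 10
for t in draws:
    counts[min(int(t), 9)] += 1
print(counts)
```

Whatever distribution is chosen for $X$, the histogram of $\theta$ rises toward $\theta_0$ and falls away from it; only the shape of the two sides changes.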

Since knowledge that $\theta$ is $\theta_0$-unimodal is completely equivalent to the
existence of the decomposition (3.2), we know such a random variable $X$ must exist.
However, we are ignorant about its precise form except for the constraints upon $X$
which are implied by the known constraints (2.2) on $\theta$. By using the decomposition
(3.2), together with conditional expectation given $X$, we obtain

$$E[h(\theta)] = E[h^*(X)] \tag{3.3}$$

where

$$h^*(x) = E[h(UX + \theta_0) \mid X = x] = \frac{1}{x} \int_0^x h(t + \theta_0)\,dt.$$
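The transform $h \mapsto h^*$ is straightforward to evaluate numerically. The following sketch approximates $h^*(x)$ with a midpoint rule; the function name `h_star`, the grid size and the numerical values of $\theta_0$, $a_1$, $a_2$ are assumptions chosen for illustration.

```python
def h_star(h, x, theta0, n=10_000):
    """Approximate h*(x) = (1/x) * integral_0^x h(t + theta0) dt.

    Substituting t = u * x turns this into the average of h(theta0 + u * x)
    over u in [0, 1], evaluated with the midpoint rule (valid for x < 0 too).
    """
    total = 0.0
    for j in range(n):
        u = (j + 0.5) / n
        total += h(theta0 + u * x)
    return total / n

theta0 = 3.0
a1, a2 = 1.0, 5.0   # hypothetical even-odds interval

def h1(t):
    # Indicator of the interval (a1, a2).
    return 1.0 if a1 < t < a2 else 0.0

def h2(t):
    # Indicator of theta <= theta0.
    return 1.0 if t <= theta0 else 0.0

print(h_star(h1, 4.0, theta0))    # close to (a2 - theta0) / 4 = 0.5
print(h_star(h2, -2.0, theta0))   # 1.0, since x < 0
print(h_star(h2, 2.0, theta0))    # 0.0, since x > 0
```

For indicator constraints the transform has a simple closed form, so the numerical version is mainly useful as a check or for less regular choices of $h$.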

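The relationship (3.3) allows us to transform the moment constraints (2.2) on $\theta$
into moment constraints of the form

$$b_i = \int h_i^*(x) f_X(x)\,dx, \qquad i = 0, 1, 2,$$

upon $X$. In this regard the transformation (3.2) is in the spirit of Kemperman (1971).
As an illustration, the transformed constraints (3.1) become

$$\begin{aligned}
1 = b_0 &= \int h_0^*(x) f_X(x)\,\lambda(dx) \\
0.5 = b_1 &= \int h_1^*(x) f_X(x)\,\lambda(dx) \\
p = b_2 &= \int h_2^*(x) f_X(x)\,\lambda(dx)
\end{aligned} \tag{3.4}$$

where

$$h_0^*(x) = \frac{1}{x} \int_0^x dt = 1,$$

$$h_1^*(x) = \begin{cases}
\dfrac{a_1 - \theta_0}{x} & \text{if } x < a_1 - \theta_0 \\[1.5ex]
1 & \text{if } a_1 - \theta_0 \le x \le a_2 - \theta_0 \\[1.5ex]
\dfrac{a_2 - \theta_0}{x} & \text{if } x > a_2 - \theta_0
\end{cases}$$

(taking $a_1 < \theta_0 < a_2$), and

$$h_2^*(x) = \begin{cases} 1 & \text{if } x < 0 \\ 0 & \text{if } x > 0. \end{cases}$$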

The problem of constructing an inferential (or prior) distribution for $\theta$ subject
to the constraints (3.1) and unimodality has now been transformed via (3.2) and (3.3)
into the problem of constructing a density function for $X$ subject to the constraints
(3.4). By unimodality of $\theta$ we know $X$ exists; however, we have no information
concerning $X$ other than the fact that it satisfies (3.4). Accordingly, we may use the
extension of Laplace's principle of insufficient reason to postulate a maximum
entropy density for $X$. We obtain (from (2.3))

$$f_X(x) = \exp\left(-\sum_{i=0}^{2} \alpha_i h_i^*(x)\right) \quad \text{for } x \in [a - \theta_0,\; b - \theta_0], \tag{3.5}$$

and again using the relation (3.2) the original density for $\theta$ becomes

$$f(\theta) = \begin{cases}
\displaystyle \int_{\theta - \theta_0}^{b - \theta_0} f_X(x)\,\frac{dx}{x} & \text{if } \theta > \theta_0 \\[2.5ex]
\displaystyle -\int_{a - \theta_0}^{\theta - \theta_0} f_X(x)\,\frac{dx}{x} & \text{if } \theta < \theta_0.
\end{cases} \tag{3.6}$$
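The construction (3.6) amounts to a one-dimensional integral of $f_X(x)/|x|$ over the part of the range of $X$ lying beyond $\theta - \theta_0$. A minimal sketch follows, assuming a midpoint rule and, purely for checking, a hypothetical uniform auxiliary density on $[a - \theta_0, b - \theta_0]$, for which the integral is known in closed form. The names `prior_density` and `uniform_fx` are illustrative.

```python
import math

def prior_density(theta, f_x, theta0, a, b, n=20_000):
    """Evaluate f(theta) from the auxiliary density f_x via (3.6), midpoint rule."""
    y = theta - theta0
    if y > 0:
        lo, hi = y, b - theta0          # integrate over x in [theta - theta0, b - theta0]
    elif y < 0:
        lo, hi = a - theta0, y          # integrate over x in [a - theta0, theta - theta0]
    else:
        raise ValueError("(3.6) is stated for theta != theta0")
    step = (hi - lo) / n
    total = 0.0
    for j in range(n):
        x = lo + (j + 0.5) * step
        total += f_x(x) / abs(x)        # -dx/x on the left branch equals dx/|x|
    return total * step

a, b, theta0 = 0.0, 10.0, 3.0

def uniform_fx(x):
    # Hypothetical auxiliary density: uniform on [a - theta0, b - theta0].
    return 1.0 / (b - a)

print(prior_density(5.0, uniform_fx, theta0, a, b), 0.1 * math.log(7.0 / 2.0))
print(prior_density(1.0, uniform_fx, theta0, a, b), 0.1 * math.log(3.0 / 2.0))
```

Evaluating `prior_density` on a grid of $\theta$ values gives the points needed for a plot of the prior.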

The constants $\{\alpha_i\}$ needed to determine (3.5), and hence (3.6), are found using
the unconstrained dual formulation discussed previously, which is explicitly stated in
the final numerical example section.

The  computer  program  to  implement  this  analysis  should  also  plot  the 
obtained  inferential  or  prior  density  using  the  construction  (3.6).  One  might  use  this 
graph  for  consultation  with  the  decision  maker.  If  more  information  is  available  or 
needs  to  be  supplied  for  decision  making  purposes,  additional  constraints  are  added 
to  (3.1),  transformed  into  new  constraints  on  X  via  (3.3)  and  added  to  the  constraint 
set (3.4). Such supplementation can be continued until the decision maker, when
presented with the graphical density representation, is satisfied with the inferential
density obtained.

4.  Numerical  Illustrations 

In this section we shall exhibit the numerical results of implementing the
previous procedure. From the duality theory given in Charnes, Cooper and Seiford
(1978), the dual to the primal problem of minimizing (2.1) subject to the constraints
(2.2) is the unconstrained convex programming problem

$$\max_{\alpha} \left\{ -\sum_{i=0}^{k} \alpha_i b_i - \int g(x) \exp\left(-\sum_{i=0}^{k} \alpha_i h_i(x)\right) \lambda(dx) \right\}. \tag{4.1}$$

Setting the derivative of (4.1) with respect to each $\alpha_i$ to zero reproduces the
moment constraints (2.2) for the density (2.3). In the unimodal estimation problem
considered in the previous section, one estimates the parameters $\{\alpha_i\}$ in the
density $f_X(x)$ by solving (4.1) with $h_i^*$ replacing $h_i$, and $\lambda(dx) = dx$.

One further point should be made here. If one wishes to impose a continuity
constraint upon the density $f$ at the mode then, from equation (3.6), such a
constraint on $\theta$ translates into a moment constraint upon the auxiliary variable
$X$ of the form

$$0 = \int h_{k+1}^*(x) f_X(x)\,dx \tag{4.2}$$

where $h_{k+1}^*(x) = 1/x$ (there is no corresponding $h_{k+1}$ constraint on $\theta$).
Similarly, smoothness constraints upon the density near the mode $\theta_0$ can be
accomplished by choosing "goal densities" $g(x)$ in the estimation of the auxiliary
density $f_X$ which are sufficiently smooth as $x \to 0$. We shall illustrate this with
three goal densities. Up to the appropriate normalizing constant these are
$g_1(x) = 1$ (corresponding to maximum entropy estimation for $f_X$),


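$$g_2(x) = 1 \quad \text{for } |x| \ge \delta,$$

with an exponential taper in $|x|$ and $\delta$ for $|x| < \delta$, and

$$g_3(x) = \frac{\exp\{-x^2 / 2\sigma^2\}}{\sqrt{2\pi}\,\sigma}.$$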


The goal density $g_2$ behaves like the constant 1 outside $|x| < \delta$, and dips
smoothly to zero as $|x| \to 0$. This goal density approximates the maximum entropy
procedure given by $g_1$, but constrains the resulting prior density $f$ to be smooth
at the mode. The goal density $g_3(x)$ corresponds to the $f_X$ density which would
result from being normally distributed, and hence this goal density gives the "close
to normality subject to constraints" result for the estimated prior density $f$.*

Figure 1 shows the resulting prior distributions obtained using each of these
goal densities and using only the following client-furnished information concerning
$\theta$:

1. $\theta$ is unimodal with possible values between 0 and 10

2. the most likely value for $\theta$ is 3

3. there are even odds that the value of $\theta$ lies between 1 and 5

4. there is a 30% chance that $\theta$ will fall short of the most likely value of 3.

(INSERT FIGURE 1 HERE)

*The continuity and smoothness at the mode is not guaranteed for an asymmetric density
using the goal density $g_3(x)$, due to the symmetry of the normal. To impose the
smoothness and the "close to normality subject to constraints" interpretation, the
product of $g_2(x)$ and $g_3(x)$ can be used as a goal density. Figure 1 shows this case.

It can be seen that the parameter $\delta$ in the goal density $g_2$ serves as a
smoothing parameter.
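For readers who wish to reproduce the flavor of the first illustration with the goal density $g_1(x) = 1$, the following sketch solves a discretized version of the dual (4.1) for the auxiliary density $f_X$. Everything beyond the four pieces of client information is an assumption made for illustration: the midpoint grid, the damped Newton iteration (used here in place of the SUMT code mentioned earlier), and the names `solve_linear` and `solve_dual`.

```python
import math

def solve_linear(mat, rhs):
    """Solve mat * d = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol

def solve_dual(xs, w, hs, b, iters=100, tol=1e-10):
    """Minimize C(alpha) = sum_i alpha_i b_i + w * sum_j exp(-sum_i alpha_i h_i(x_j)),
    the negative of a discretized (4.1) with g = 1, by damped Newton steps."""
    k, m = len(b), len(xs)
    H = [[h(x) for x in xs] for h in hs]          # H[i][j] = h_i(x_j)

    def density(alpha):
        return [math.exp(-sum(alpha[i] * H[i][j] for i in range(k))) for j in range(m)]

    def cost(alpha):
        try:
            return sum(ai * bi for ai, bi in zip(alpha, b)) + w * sum(density(alpha))
        except OverflowError:                      # exp overflow: reject the trial point
            return float("inf")

    alpha = [0.0] * k
    for _ in range(iters):
        f = density(alpha)
        grad = [b[i] - w * sum(H[i][j] * f[j] for j in range(m)) for i in range(k)]
        if max(abs(gi) for gi in grad) < tol:
            break
        hess = [[w * sum(H[i][j] * H[l][j] * f[j] for j in range(m)) for l in range(k)]
                for i in range(k)]
        step = solve_linear(hess, [-gi for gi in grad])
        slope = sum(gi * si for gi, si in zip(grad, step))
        c0, t = cost(alpha), 1.0
        # Backtracking line search (the small slack absorbs rounding near the optimum).
        while t > 1e-10:
            trial = [ai + t * si for ai, si in zip(alpha, step)]
            if cost(trial) <= c0 + 1e-4 * t * slope + 1e-12:
                break
            t *= 0.5
        alpha = [ai + t * si for ai, si in zip(alpha, step)]
    return alpha, density(alpha)

# Client information of the first illustration.
a, b_end, theta0 = 0.0, 10.0, 3.0
a1, a2, p = 1.0, 5.0, 0.3

n = 1000
w = (b_end - a) / n                                # cell width of the midpoint grid
xs = [a - theta0 + (j + 0.5) * w for j in range(n)]
hs = [
    lambda x: 1.0,                                                          # h_0*
    lambda x: min(1.0, (a2 - theta0) / x) if x > 0 else min(1.0, (a1 - theta0) / x),  # h_1*
    lambda x: 1.0 if x < 0 else 0.0,                                        # h_2*
]
alpha, f_x = solve_dual(xs, w, hs, [1.0, 0.5, p])
print(alpha)
```

The returned values `f_x` approximate (3.5) on the grid; the prior for $\theta$ then follows by integrating $f_X(x)/|x|$ as in (3.6). The other goal densities would enter as a weight $g(x_j)$ multiplying each exponential term.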

For a second illustration, assume we have the following information:

1. $\theta$ is unimodal with possible values between 0 and 10

2. the most likely value for $\theta$ is 5

3. there are even odds that the value of $\theta$ is between 4 and 6

4. the distribution of $\theta$ is symmetric about 5

(INSERT FIGURE 2 HERE)

Figure  2  shows  the  results  of  the  calculations  in  this  second  situation  for  each 
of  the  goal  densities. 

As a final note, we should remark that the technique described in this paper
can also be extended into a new technique for nonparametric unimodal density
estimation using actual data and prior information. We develop this nonparametric
unimodal density estimation technique in Brockett, Charnes and Paick (1983).